Privacy-preserving evaluation of decision trees

ABSTRACT

A method for performing a secure evaluation of a decision tree, including: receiving, by a processor of a server, an encrypted feature vector $〚x〛 = (〚x_1〛, \ldots, 〚x_n〛)$ from a client; choosing a random mask $\mu_0$; calculating $〚m_0〛$ and sending $〚m_0〛$ to the client, wherein $〚m_0〛 = 〚x_{i_0^{(0)}} - t_0^{(0)} + \mu_0〛$ and $t_0^{(0)}$ is a threshold value in the first node in the first level of a decision tree $\mathcal{T}'$; performing a comparison protocol on $m_0$ and $\mu_0$, wherein the server produces a comparison bit $b_0$ and the client produces a comparison bit $b'_0$; choosing a random bit $s_0 \in \{0,1\}$ and when $s_0 = 1$ switching a left and right subtrees of $\mathcal{T}'$; sending $b_0 \oplus s_0$ to the client; and for each level $\ell = 1, 2, \ldots, d-1$ of the decision tree $\mathcal{T}'$, where $d$ is the number of levels in the decision tree $\mathcal{T}'$, perform the following steps: receiving from the client $〚y_k〛$ where $k = 0, 1, \ldots, 2^\ell - 1$; performing a comparison protocol on $m_\ell$ and $\mu_\ell$, wherein $\mu_\ell$ is a random mask and $m_\ell$ is based upon $〚x〛$, $t_k^{(\ell)}$, $〚y_k〛$, and $\mu_\ell$, and the server produces a comparison bit $b_\ell$ and the client produces a comparison bit $b'_\ell$; choosing a random bit $s_\ell \in \{0,1\}$ and when $s_\ell = 1$ switching all left and right subtrees at level $\ell$ of $\mathcal{T}'$; and sending $b_\ell \oplus s_\ell$ to the client.

TECHNICAL FIELD

Various exemplary embodiments disclosed herein relate generally to a method and apparatus for performing a privacy preserving evaluation of decision trees.

BACKGROUND

Protocols have been developed for comparing private values using homomorphic encryption. These protocols may be used in the evaluation of decision trees. Embodiments improving upon the state of the art will be described below.

SUMMARY

A brief summary of various exemplary embodiments is presented below. Some simplifications and omissions may be made in the following summary, which is intended to highlight and introduce some aspects of the various exemplary embodiments, but not to limit the scope of the invention. Detailed descriptions of an exemplary embodiment adequate to allow those of ordinary skill in the art to make and use the inventive concepts will follow in later sections.

Various embodiments relate to a method for performing a secure evaluation of a decision tree, including: receiving, by a processor of a server, an encrypted feature vector $〚x〛 = (〚x_1〛, \ldots, 〚x_n〛)$ from a client, where $n$ is an integer, wherein $〚x〛$ denotes an additively homomorphic encryption of $x$ using a cryptographic key of the second party; choosing a random mask $\mu_0$; calculating $〚m_0〛$ and sending $〚m_0〛$ to the client, wherein $〚m_0〛 = 〚x_{i_0^{(0)}} - t_0^{(0)} + \mu_0〛$ and $x_{i_0^{(0)}}$ is the value of entry $i_0^{(0)}$ of the feature vector $x$ ($1 \le i_0^{(0)} \le n$), and $t_0^{(0)}$ is a threshold value in the first node in the first level of a decision tree $\mathcal{T}'$; performing a comparison protocol on $m_0$ and $\mu_0$, wherein the server produces a comparison bit $b_0$ and the client produces a comparison bit $b'_0$; choosing a random bit $s_0 \in \{0,1\}$ and when $s_0 = 1$ switching a left and right subtrees of $\mathcal{T}'$; sending $b_0 \oplus s_0$ to the client; and for each level $\ell = 1, 2, \ldots, d-1$ of the decision tree $\mathcal{T}'$, where $d$ is the number of levels in the decision tree $\mathcal{T}'$, perform the following steps: receiving from the client $〚y_k〛$ where $k = 0, 1, \ldots, 2^\ell - 1$; performing a comparison protocol on $m_\ell$ and $\mu_\ell$, wherein $\mu_\ell$ is a random mask and $m_\ell$ is based upon $〚x〛$, $t_k^{(\ell)}$, $〚y_k〛$, and $\mu_\ell$, and the server produces a comparison bit $b_\ell$ and the client produces a comparison bit $b'_\ell$; choosing a random bit $s_\ell \in \{0,1\}$ and when $s_\ell = 1$ switching all left and right subtrees at level $\ell$ of $\mathcal{T}'$; and sending $b_\ell \oplus s_\ell$ to the client.

Further various embodiments relate to a non-transitory machine-readable storage medium encoded with instructions for performing a secure evaluation of a decision tree, including: instructions for receiving, by a processor of a server, an encrypted feature vector $〚x〛 = (〚x_1〛, \ldots, 〚x_n〛)$ from a client, where $n$ is an integer, wherein $〚x〛$ denotes an additively homomorphic encryption of $x$ using a cryptographic key of the second party; instructions for choosing a random mask $\mu_0$; instructions for calculating $〚m_0〛$ and sending $〚m_0〛$ to the client, wherein $〚m_0〛 = 〚x_{i_0^{(0)}} - t_0^{(0)} + \mu_0〛$ and $x_{i_0^{(0)}}$ is the value of entry $i_0^{(0)}$ of the feature vector $x$ ($1 \le i_0^{(0)} \le n$), and $t_0^{(0)}$ is a threshold value in the first node in the first level of a decision tree $\mathcal{T}'$; instructions for performing a comparison protocol on $m_0$ and $\mu_0$, wherein the server produces a comparison bit $b_0$ and the client produces a comparison bit $b'_0$; instructions for choosing a random bit $s_0 \in \{0,1\}$ and when $s_0 = 1$ switching a left and right subtrees of $\mathcal{T}'$; instructions for sending $b_0 \oplus s_0$ to the client; and for each level $\ell = 1, 2, \ldots, d-1$ of the decision tree $\mathcal{T}'$, where $d$ is the number of levels in the decision tree $\mathcal{T}'$, perform the following instructions: instructions for receiving from the client $〚y_k〛$ where $k = 0, 1, \ldots, 2^\ell - 1$; instructions for performing a comparison protocol on $m_\ell$ and $\mu_\ell$, wherein $\mu_\ell$ is a random mask and $m_\ell$ is based upon $〚x〛$, $t_k^{(\ell)}$, $〚y_k〛$, and $\mu_\ell$, and the server produces a comparison bit $b_\ell$ and the client produces a comparison bit $b'_\ell$; instructions for choosing a random bit $s_\ell \in \{0,1\}$ and when $s_\ell = 1$ switching all left and right subtrees at level $\ell$ of $\mathcal{T}'$; and instructions for sending $b_\ell \oplus s_\ell$ to the client.

Various embodiments are described, wherein the server engages in a 1-out-of-$2^d$ oblivious transfer with the client to learn the value of $\mathcal{T}'(x)$.

Various embodiments are described, wherein the client computes $r = (\beta_0, \beta_1, \ldots, \beta_{d-1})_2$ which is the index of the leaf node in $\mathcal{T}'$ indicating the output of the decision tree and where $\beta_\ell = b'_\ell \oplus b_\ell \oplus s_\ell$.

Various embodiments are described, wherein the first encryption uses the Paillier cryptosystem.

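Various embodiments are described, wherein performing a comparison protocol on $m_\ell$ and $\mu_\ell$, further includes: choosing, by the server, $2^\ell$ random masks $r_k^{(\ell)}$ for $0 \le k \le 2^\ell - 1$, and a random mask $\mu_\ell$; computing $〚M_\ell〛 = 〚\sum_{k=0}^{2^\ell - 1} y_k r_k^{(\ell)} - \mu_\ell〛$; computing for $0 \le k \le 2^\ell - 1$ $〚z_k^{(\ell)}〛$ where $z_k^{(\ell)} = x_{i_k^{(\ell)}} - t_k^{(\ell)} + r_k^{(\ell)}$; and sending $〚M_\ell〛$ and $〚z_0^{(\ell)}〛, \ldots, 〚z_{2^\ell - 1}^{(\ell)}〛$ to the client.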

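Various embodiments are described, wherein performing a comparison protocol on $m_\ell$ and $\mu_\ell$, further includes: decrypting, by the client, $〚M_\ell〛$ and $〚z_k^{(\ell)}〛$ to get $M_\ell$ and $z_k^{(\ell)}$; and computing, by the client, $m_\ell = \sum_{k=0}^{2^\ell - 1} y_k z_k^{(\ell)} - M_\ell$.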

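Various embodiments are described, wherein performing a comparison protocol on $m_\ell$ and $\mu_\ell$, further includes: choosing, by the server, a random mask $\mu_\ell$; computing $\langle\langle m_\ell \rangle\rangle$ where $m_\ell = \mu_\ell + \sum_{k=0}^{2^\ell - 1} y_k (x_{i_k^{(\ell)}} - t_k^{(\ell)})$, $\langle\langle w \rangle\rangle$ denotes a second homomorphic encryption of $w$ using a cryptographic key of the second party; and sending $\langle\langle m_\ell \rangle\rangle$ to the client.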

Various embodiments are described, wherein performing a comparison protocol on $m_\ell$ and $\mu_\ell$, further comprises, decrypting, by the client, $\langle\langle m_\ell \rangle\rangle$ to get $m_\ell$.

Various embodiments are described, wherein the second encryption uses the Boneh-Goh-Nissim (BGN) cryptosystem.

Various embodiments are described, wherein the second encryption uses a somewhat homomorphic cryptosystem.

Various embodiments are described, wherein for each level $\ell = 1, 2, \ldots, d-1$ of the decision tree $\mathcal{T}'$ the client computes $y_k = 1\{\beta^{(\ell)} = k\}$, where $\beta^{(\ell)} = (\beta_0, \beta_1, \ldots, \beta_{\ell-1})_2$, encrypts $y_k$ resulting in $〚y_k〛$, and sending $〚y_k〛$ to the server.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to better understand various exemplary embodiments, reference is made to the accompanying drawings, wherein:

FIG. 1 illustrates an exemplary protocol using additively homomorphic encryption (e.g., Paillier cryptosystem) for performing step 5b; and

FIG. 2 illustrates an exemplary protocol using somewhat homomorphic encryption (e.g., Boneh-Goh-Nissim cryptosystem) for performing step 5b.

To facilitate understanding, identical reference numerals have been used to designate elements having substantially the same or similar structure and/or substantially the same or similar function.

DETAILED DESCRIPTION

The description and drawings illustrate the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its scope. Furthermore, all examples recited herein are principally intended expressly to be for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Additionally, the term, “or,” as used herein, refers to a non-exclusive or (i.e., and/or), unless otherwise indicated (e.g., “or else” or “or in the alternative”). Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.

Privacy-preserving data mining has gained a lot of attention in the last decade. The main goal of these methods is to analyze and extract useful information from the data of the users without accessing the individuals' private data. One commonly used machine learning technique is the decision tree. Decision trees are simple classifiers that consist of a collection of decision nodes in a tree structure. They are particularly useful as they can be easily implemented in query-based systems and databases.

In this disclosure, an embodiment of a new protocol to evaluate decision trees in a secure way is disclosed. It is assumed that there is a client with input data $x \in \mathbb{Z}^n$ who wants to evaluate the decision tree $\mathcal{T}: \mathbb{Z}^n \to \mathbb{Z}$. The goal is for the client to learn the output of the decision tree while the server learns nothing about the client's private input.

An embodiment of a privacy preserving decision tree protocol will be described herein that decreases the computational complexity as well as the required bandwidth as compared to the current state-of-the-art. In previous works in the literature, in order to have privacy for the user's data, the server must run the comparison protocol for all the internal nodes. The embodiment described here is a new method for decision tree evaluation which only requires one comparison in each level of the decision tree. For a complete binary tree of depth d, this reduces the total number of comparisons from 2^(d)−1 to d, that is, to a number logarithmic in the number of internal nodes. This is a worthwhile improvement as performing the comparison protocol usually requires massive amounts of computation and bandwidth.

The new embodiment of the decision tree protocol utilizes an additively homomorphic encryption scheme. Let $〚m〛$ denote the encryption of a message $m$. The homomorphic property implies that for any two messages $m$ and $m'$, the encryption of $m+m'$ can be obtained from the encryptions of $m$ and $m'$ as $〚m+m'〛 = 〚m〛 \cdot 〚m'〛$. Likewise, for a known constant $c$, the encryption of $c \cdot m$ can be obtained from the encryption of $m$ as $〚c\,m〛 = 〚m〛^c$.

An efficient additively homomorphic encryption scheme is provided by the Paillier cryptosystem. The Paillier scheme is defined as follows. On input some security parameter, two large primes p and q are generated. Let N=pq and λ=lcm(p−1, q−1). The public key is pk=N and the secret key is sk=λ. The message space is $\mathcal{M} = \{0, \ldots, N-1\}$.

The encryption of a message $m \in \mathcal{M}$ is given by $C = (1+mN)\,r^N \bmod N^2$ for some random element $r \in \{1, \ldots, N-1\}$. Writing $C = 〚m〛$, the decryption of $C$ is obtained as $m = \frac{L}{\lambda} \bmod N$ where $L = (C^\lambda - 1 \bmod N^2)/N$.

Given $C_1 = 〚m_1〛$ and $C_2 = 〚m_2〛$, one can obtain the encryption of $m_1+m_2$ as $〚m_1+m_2〛 = C_1 C_2\, r'^N \pmod{N^2}$ for any non-zero integer $r'$.
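The Paillier operations described above can be sketched in a few lines of Python. This is a toy illustration only: the primes are far too small for real use, and the function names (`keygen`, `encrypt`, `decrypt`, `add`, `scalar_mul`) are chosen here for illustration and do not come from the disclosure. It assumes Python 3.8 or later for the modular inverse `pow(a, -1, n)`.

```python
import math
import random

def keygen(p, q):
    """Toy key generation from two given primes: public key N, secret key lambda."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    return n, lam

def encrypt(n, m, rng=random):
    """C = (1 + m*N) * r^N mod N^2 for a random r coprime to N."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, lam, c):
    """m = L / lambda mod N, with L = (C^lambda - 1 mod N^2) / N."""
    n2 = n * n
    big_l = (pow(c, lam, n2) - 1) // n
    return (big_l * pow(lam, -1, n)) % n

def add(n, c1, c2):
    """Ciphertext of m1 + m2 from ciphertexts of m1 and m2."""
    return (c1 * c2) % (n * n)

def scalar_mul(n, c, k):
    """Ciphertext of k*m from a ciphertext of m, for a known constant k >= 0."""
    return pow(c, k, n * n)
```

A typical use would generate a key pair with `keygen(293, 433)`, encrypt two small integers, and check that decrypting `add(n, c1, c2)` yields their sum modulo N.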

Let the server possess the database x₁, x₂, . . . , x_(n), and the client has a selection index i∈{1, 2, . . . , n}. In a 1-out-of-n oblivious transfer protocol, the client learns only x_(i) and the server learns nothing.

In a secure comparison protocol there are two parties who want to compare their private data. It is assumed that one party has integer x and the other party has integer y in the clear and they want to compare these numbers (without revealing the value) to check whether x≤y or not. The final result of the comparison will be secretly shared between the two parties.

The DGK+ comparison protocol which was proposed by Damgård, Geisler, and Krøigaard will be used as an example. The setting is as follows. Alice possesses a private t-bit integer x=Σ_(i=0) ^(t−1)x_(i) 2^(i) and Bob possesses another private t-bit integer y=Σ_(i=0) ^(t−1)y_(i) 2^(i). The goal for Alice and Bob is to respectively obtain bits δ_(A) and δ_(B) such that δ_(A)⊕δ_(B)=[x≤y]. The protocol proceeds in four steps:

    1. Bob encrypts the bits of $y = \sum_{i=0}^{t-1} y_i 2^i$ under his public key and sends $〚y_i〛$, $0 \le i \le t-1$, to Alice.
    2. Alice chooses uniformly at random a bit $\delta_A \in \{0,1\}$ and defines $s = 1 - 2\delta_A$. Alice also selects $t+1$ random invertible scalars $r_i$, $-1 \le i \le t-1$.
    3. Next, for $t-1 \ge i \ge 0$, Alice computes
       $〚c_i^*〛 = \left( 〚s〛 \cdot 〚x_i〛 \cdot 〚y_i〛^{-1} \cdot \left( \prod_{j=i+1}^{t-1} 〚x_j \oplus y_j〛 \right)^{3} \right)^{r_i}.$
       Finally, Alice computes
       $〚c_{-1}^*〛 = \left( 〚\delta_A〛 \cdot \prod_{j=0}^{t-1} 〚x_j \oplus y_j〛 \right)^{r_{-1}}.$
       Alice sends the $t+1$ ciphertexts $〚c_i^*〛$ in a random order to Bob.
    4. Using his private key, Bob decrypts the received $〚c_i^*〛$'s. If one is decrypted to zero, Bob sets $\delta_B = 1$. Otherwise, he sets $\delta_B = 0$.
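The logic of the four steps above can be checked with a plaintext sketch. The Python below is an assumption-laden illustration, not the protocol itself: it works on cleartext bits, so it provides no privacy, and it omits the random scalars $r_i$ because multiplying by a non-zero scalar does not change whether a value is zero. The function name `dgk_plus_clear` is hypothetical.

```python
import random

def dgk_plus_clear(x, y, t, rng=random):
    """Plaintext model of the DGK+ steps for t-bit integers x (Alice) and y (Bob).

    Returns (delta_a, delta_b) such that delta_a XOR delta_b == [x <= y].
    """
    xb = [(x >> i) & 1 for i in range(t)]
    yb = [(y >> i) & 1 for i in range(t)]

    # Step 2: Alice picks a random bit and the sign s = 1 - 2*delta_a.
    delta_a = rng.randrange(2)
    s = 1 - 2 * delta_a

    # Step 3: c_i = s + x_i - y_i + 3 * sum_{j > i} (x_j XOR y_j).
    values = []
    for i in range(t):
        higher = sum(xb[j] ^ yb[j] for j in range(i + 1, t))
        values.append(s + xb[i] - yb[i] + 3 * higher)
    # c_{-1} = delta_a + sum_j (x_j XOR y_j); it is zero only when x == y and delta_a == 0.
    values.append(delta_a + sum(xb[j] ^ yb[j] for j in range(t)))
    rng.shuffle(values)  # Alice sends the values in random order

    # Step 4: Bob sets delta_b = 1 exactly when some value is zero.
    delta_b = 1 if 0 in values else 0
    return delta_a, delta_b
```

In the real protocol Alice never sees the bits of $y$; she evaluates the same expressions homomorphically on the ciphertexts $〚y_i〛$.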

This DGK+ comparison protocol has been improved upon by introducing a new method that reduces both the computational complexity and the communication bandwidth by a factor of two. This comparison protocol is described in copending patent application ser. No. 15/849,420 entitled “PRIVACY PRESERVING COMPARISON” filed Dec. 20, 2017. This improved version along with other privacy preserving comparison protocols may be used in the embodiment described in this disclosure.

Decision tree evaluation is a well-known method in the machine learning community to evaluate the model's output corresponding to a user's input data. In order to evaluate the decision tree $\mathcal{T}$ on the input vector $x \in \mathbb{Z}^n$, when everything is available in the clear, one should traverse $\mathcal{T}$ by doing one comparison for each of the levels of the tree, and finding the leaf node corresponding to $x$. The value of this leaf node is the output of the model. As one can see, the total number of comparisons in this case is bounded by the depth of the tree $\mathcal{T}$, i.e., the length of the longest path from the root to the leaves.

In David J Wu, Tony Feng, Michael Naehrig, and Kristin Lauter, “Privately evaluating decision trees and random forests,” Proceedings on Privacy Enhancing Technologies, 2016(4):335-355, 2016, Wu et al. introduced a method for privately evaluating decision trees and random forests. They have proposed a method in which a client can learn the output corresponding to her input value while the server learns nothing. They also showed that by using their protocol the client's data and the server's model remain private. They compared the performance of their secure protocol with the previous works in the literature and showed a 10 times improvement in terms of computations and bandwidth.

In spite of the fact that the method proposed by Wu et al. is secure and has much better performance than the prior art, they still need to perform the comparison protocol for all the internal nodes of the tree. In other words, when the decision tree $\mathcal{T}$ is a complete binary tree of depth $d$, in their protocol they need to perform $2^d - 1$ comparisons to securely compute the output of the model. However, as discussed earlier, when the input and the model are available in the clear one should perform only $d$ comparisons to evaluate the decision tree.

In this disclosure, an embodiment is described for privately evaluating the decision tree with only d comparisons. This embodiment uses the same idea as Wu et al. to permute the tree nodes, whereas it needs to do only one comparison for each level of the decision tree. This is done by letting the client learn the indices of the corresponding internal nodes which appear while traversing the permuted tree. Having the knowledge of the index of the internal node at each level will result in doing one comparison in each level. Note that knowing the indices of the internal nodes in the permuted tree does not reveal any information about the actual decision tree because the nodes at each level are permuted using a random permutation.

Because the comparison protocol requires a large amount of computation and bandwidth, the protocol described herein greatly improves on previous results in terms of computation and speed by reducing the total number of comparisons from 2^(d)−1 to d, which is logarithmic in the number of internal nodes.

The embodiment described herein is fully generic in terms of the encryption scheme and the comparison protocol. Indeed, one can use the embodiment for evaluating a decision tree with the freedom of choosing any encryption scheme that is additively homomorphic, as well as choosing any comparison protocol for comparing the private data of the client and the server.

It is assumed that the server has a decision tree $\mathcal{T}: \mathbb{Z}^n \to \mathbb{Z}$ and the client has a feature vector $x \in \mathbb{Z}^n$. It is assumed that $\mathcal{T}$ is a complete binary tree of depth $d$. The depth of a tree is the length of the longest path from the root to a leaf. In general, a binary decision tree may not be complete but one can transform any decision tree to a complete tree by introducing dummy internal nodes.

The following rule is used for indexing the nodes of the decision tree $\mathcal{T}$ with depth $d$:

    - The nodes at level $\ell$ in the tree are the nodes that have distance $\ell$ from the root of the tree. Therefore, all the internal nodes in $\mathcal{T}$ have level between 0 and $d-1$;
    - The $2^\ell$ nodes at level $\ell$ (for $\ell = 0, 1, \ldots, d-1$) are indexed from the left to the right by an index $k$, with $0 \le k \le 2^\ell - 1$, where index 0 denotes the leftmost node at level $\ell$ and index $2^\ell - 1$ denotes the rightmost node; and
    - An index from 0 to $2^d - 1$ is defined for the leaf nodes. With this indexing scheme, the leaves of the tree, when read from left-to-right, correspond with the ordering $z_0, \ldots, z_{2^d - 1}$.

Each internal node in the tree, say the node with index $k$ at level $\ell$, is associated with a Boolean function of $x$ given by $1\{x_{i_k^{(\ell)}} \le t_k^{(\ell)}\}$, where $i_k^{(\ell)}$ is an index in the feature vector $x \in \mathbb{Z}^n$, and $t_k^{(\ell)}$ is a threshold. To evaluate the output of the decision tree, one should start from the root and at each level, depending on the result of this Boolean function, take either the left branch (when, for example, the result is 0) or the right branch (when, for example, the result is 1) of the tree, and repeat the process until a leaf node is reached. The output $\mathcal{T}(x)$ is $z_r$, the value of the so-obtained leaf node.
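As a minimal sketch of evaluation in the clear under the indexing rule above, a complete tree can be stored level by level. The representation below (lists `features`, `thresholds`, `leaves`, 0-based feature indices) and the convention that a comparison result of 1 selects the right child are assumptions made for illustration.

```python
def evaluate_clear(features, thresholds, leaves, x):
    """Evaluate a complete binary decision tree stored level by level.

    features[l][k] and thresholds[l][k] describe node k (left to right) at level l;
    leaves holds the 2**d leaf values z_0, ..., z_{2**d - 1}.
    """
    k = 0  # index of the current node within its level
    for level in range(len(features)):
        bit = 1 if x[features[level][k]] <= thresholds[level][k] else 0
        k = 2 * k + bit  # children of node k are nodes 2k (left) and 2k + 1 (right)
    return leaves[k]
```

With this layout, exactly one comparison is made per level, and the final value of `k` is the leaf index $r$.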

In order to find the index of the leaf node, r, for the feature vector x, it is necessary and sufficient to know the result of the Boolean comparison at each level of the tree. The embodiment described herein is therefore optimal in the number of comparisons.

It is assumed that the client has a matching public/private key pair (pk, sk) which is used for encryption and decryption of the messages under an additively-homomorphic encryption scheme as described above. Also, as described above, $〚m〛$ is used to denote an encryption of the message $m$ under the client's public key, pk. It is also assumed that the server uses a copied version of the decision tree $\mathcal{T}$, denoted by $\mathcal{T}'$, and performs the permutation on that. An embodiment of the decision tree protocol proceeds as follows:

    1. The client encrypts entries of the feature vector $x = (x_1, \ldots, x_n) \in \mathbb{Z}^n$ and sends $〚x〛 = (〚x_1〛, \ldots, 〚x_n〛)$ to the server.
    2. The server defines $\mathcal{T}' \leftarrow \mathcal{T}$. It chooses a random mask $\mu_0$ in the message space and sends $〚m_0〛 = 〚x_{i_0^{(0)}} - t_0^{(0)} + \mu_0〛$ to the client. The client recovers $m_0$ using private key sk.
    3. The client and server perform the comparison protocol on $m_0$ and $\mu_0$ and share the result. At the end of the protocol, the client possesses $b'_0 \in \{0,1\}$ and the server has $b_0 \in \{0,1\}$ such that:
       $b_0 \oplus b'_0 = 1\{m_0 \le \mu_0\} = 1\{x_{i_0^{(0)}} \le t_0^{(0)}\}.$
    4. The server chooses a bit $s_0 \in \{0,1\}$ uniformly at random. If $s_0 = 1$, the server switches the left and right subtrees of $\mathcal{T}'$, and calls $\mathcal{T}'$ the resulting tree. The server then sends $b_0 \oplus s_0$ to the client, that in turn recovers $\beta_0 = b'_0 \oplus b_0 \oplus s_0$.
    5. For $\ell = 1, 2, \ldots, d-1$:
        (a) The client defines $\beta^{(\ell)} = (\beta_0, \beta_1, \ldots, \beta_{\ell-1})_2 := \sum_{j=0}^{\ell-1} \beta_j 2^{\ell-1-j}$. For $k = 0, 1, \ldots, 2^\ell - 1$, it sets $y_k = 1\{\beta^{(\ell)} = k\}$ and sends $〚y_k〛$ to the server. Note that this definition implies that $y_{\beta^{(\ell)}} = 1$ and $y_k = 0$ for $k \ne \beta^{(\ell)}$.
        (b) The server and client engage in a multi-party computation protocol and secret share the result of the comparison at level $\ell$. At the end of the protocol, the client possesses $b'_\ell \in \{0,1\}$ and the server has $b_\ell \in \{0,1\}$ such that:
            $b_\ell \oplus b'_\ell = 1\left\{ \sum_{k=0}^{2^\ell - 1} y_k \left( x_{i_k^{(\ell)}} - t_k^{(\ell)} \right) \le 0 \right\}.$
        (c) The server chooses a bit $s_\ell \in \{0,1\}$ uniformly at random. If $s_\ell = 1$, the server switches all the left and right subtrees at level $\ell$ of $\mathcal{T}'$, and calls $\mathcal{T}'$ the resulting tree. The server sends $b_\ell \oplus s_\ell$ to the client, that in turn recovers $\beta_\ell = b'_\ell \oplus b_\ell \oplus s_\ell$.
    6. The client computes $r = (\beta_0, \beta_1, \ldots, \beta_{d-1})_2$ which is the index of the leaf node in $\mathcal{T}'$. Next, the client engages in a 1-out-of-$2^d$ oblivious transfer with the server to learn the value of $\mathcal{T}(x)$.
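The control flow of steps 1 to 6 can be simulated without any cryptography to see why the randomly switched tree still leads to the correct leaf. The Python sketch below is a plaintext model under stated assumptions: the comparison result is computed directly instead of being secret shared, the oblivious transfer is replaced by a plain lookup, a comparison result of 1 selects the right child, and the level-by-level tree layout and all function names are hypothetical.

```python
import random

def evaluate_clear(levels, leaves, x):
    """Reference evaluation: levels[l][k] = (feature index, threshold) of node k at level l."""
    k = 0
    for nodes in levels:
        i, t = nodes[k]
        k = 2 * k + (1 if x[i] <= t else 0)
    return leaves[k]

def swap_level(levels, leaves, level):
    """Switch the left and right subtrees of every node at the given level (in place)."""
    d = len(levels)
    for m in range(level + 1, d):
        mask = 1 << (m - 1 - level)  # bit of the node index decided at `level`
        levels[m] = [levels[m][k ^ mask] for k in range(2 ** m)]
    mask = 1 << (d - 1 - level)
    leaves[:] = [leaves[k ^ mask] for k in range(2 ** d)]

def evaluate_protocol(levels, leaves, x, rng=random):
    """Plaintext model of the protocol flow; returns the value the client would obtain."""
    levels = [list(nodes) for nodes in levels]  # server's working copy T'
    leaves = list(leaves)
    beta = 0  # beta^(l): index of the current node in the permuted tree
    for level in range(len(levels)):
        nodes = levels[level]
        y = [1 if k == beta else 0 for k in range(len(nodes))]        # step 5a: one-hot selector
        total = sum(yk * (x[i] - t) for yk, (i, t) in zip(y, nodes))  # step 5b argument
        cmp_bit = 1 if total <= 0 else 0  # plays the role of b XOR b'
        s = rng.randrange(2)              # step 5c: random switch bit
        if s:
            swap_level(levels, leaves, level)
        beta = 2 * beta + (cmp_bit ^ s)   # client appends beta_l
    return leaves[beta]                   # step 6, with the oblivious transfer replaced by a lookup
```

Running `evaluate_protocol` and `evaluate_clear` on the same tree and input should give the same leaf value regardless of the random switch bits, which is the correctness property the protocol relies on.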

In step 5a in the embodiment described above, letting $y = (y_0, \ldots, y_{2^\ell - 1})$, it turns out that

$y = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the single 1 in the $\beta^{(\ell)}$-th position,

namely, $y$ has a single bit set to 1. As a result, in step 5b, the following results:

$\sum_{k=0}^{2^\ell - 1} y_k \left( x_{i_k^{(\ell)}} - t_k^{(\ell)} \right) = x_{i_{k^*}^{(\ell)}} - t_{k^*}^{(\ell)}$ where $k^* = \beta^{(\ell)}$,

and thus

$b_\ell \oplus b'_\ell = \begin{cases} 1 & \text{if } x_{i_{k^*}^{(\ell)}} \le t_{k^*}^{(\ell)} \\ 0 & \text{otherwise} \end{cases}.$
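As a small worked instance (the bit values are chosen here purely for illustration), take level $\ell = 2$ and suppose the client has recovered $\beta_0 = 1$ and $\beta_1 = 0$, reading $\beta_0$ as the most significant bit:

```latex
\beta^{(2)} = (1, 0)_2 = 2, \qquad
y = (y_0, y_1, y_2, y_3) = (0, 0, 1, 0), \qquad
\sum_{k=0}^{3} y_k \left( x_{i_k^{(2)}} - t_k^{(2)} \right) = x_{i_2^{(2)}} - t_2^{(2)} .
```

Only the node with index 2 at level 2 contributes, so a single comparison against 0 decides the branch at this level.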

The advantage of this approach is that only a single comparison is needed as opposed to the approach of Wu et al. where $2^\ell$ comparisons are performed at level $\ell$. Indeed, at level $\ell$, their proposed method requires the evaluation of $1\{x_{i_k^{(\ell)}} \le t_k^{(\ell)}\}$ for $0 \le k \le 2^\ell - 1$.

Now two detailed implementations of the multi-party protocol for step 5b will be presented. The first embodiment makes use of additively homomorphic encryption, and the second embodiment makes use of somewhat homomorphic encryption.

FIG. 1 illustrates an exemplary protocol using additively homomorphic encryption (e.g., Paillier cryptosystem) for performing step 5b.

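The steps of this protocol are as follows: After the client computes $y_k$, for $0 \le k \le 2^\ell - 1$, and the server receives inputs $〚x〛$, $t_k^{(\ell)}$, and $〚y_k〛$, for $0 \le k \le 2^\ell - 1$, in step 1, the server chooses $2^\ell$ random masks $r_k^{(\ell)}$ for $0 \le k \le 2^\ell - 1$, and a random mask $\mu_\ell$. In step 2, the server defines and computes $〚M_\ell〛 = 〚\sum_{k=0}^{2^\ell - 1} y_k r_k^{(\ell)} - \mu_\ell〛$. In step 3, the server computes for $0 \le k \le 2^\ell - 1$ $〚z_k^{(\ell)}〛$ where $z_k^{(\ell)} = x_{i_k^{(\ell)}} - t_k^{(\ell)} + r_k^{(\ell)}$. In step 4, the server sends $〚M_\ell〛$ and $〚z_0^{(\ell)}〛, \ldots, 〚z_{2^\ell - 1}^{(\ell)}〛$ to the client.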

In step 5, the client decrypts $〚M_\ell〛$ and $〚z_k^{(\ell)}〛$ to get $M_\ell$ and $z_k^{(\ell)}$. In step 6, the client sets $m_\ell = \sum_{k=0}^{2^\ell - 1} y_k z_k^{(\ell)} - M_\ell$. In step 7, the client and server engage in a comparison protocol (e.g., the DGK+ protocol) on input $m_\ell$ for the client and $\mu_\ell$ for the server. The client then outputs $b'_\ell$ and the server outputs $b_\ell$. These output values can then be used as described above in steps 5c and 6 in the decision tree protocol.
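The reason the client's value can be compared against the server's mask is worth spelling out. Assuming the definitions $M_\ell = \sum_k y_k r_k^{(\ell)} - \mu_\ell$ and $z_k^{(\ell)} = x_{i_k^{(\ell)}} - t_k^{(\ell)} + r_k^{(\ell)}$ used by the server in this embodiment, the per-node masks cancel:

```latex
\begin{aligned}
m_\ell &= \sum_{k=0}^{2^\ell-1} y_k z_k^{(\ell)} - M_\ell \\
       &= \sum_{k=0}^{2^\ell-1} y_k \left( x_{i_k^{(\ell)}} - t_k^{(\ell)} + r_k^{(\ell)} \right)
          - \sum_{k=0}^{2^\ell-1} y_k r_k^{(\ell)} + \mu_\ell \\
       &= \sum_{k=0}^{2^\ell-1} y_k \left( x_{i_k^{(\ell)}} - t_k^{(\ell)} \right) + \mu_\ell .
\end{aligned}
```

Hence $m_\ell \le \mu_\ell$ holds exactly when $\sum_k y_k (x_{i_k^{(\ell)}} - t_k^{(\ell)}) \le 0$, provided the masked values do not wrap around in the message space, which is the condition required in step 5b.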

Note that $〚M_\ell〛$ and $〚z_k^{(\ell)}〛$ may be computed by the server from $〚x〛$, $t_k^{(\ell)}$, and $〚y_k〛$ as they satisfy

$〚M_\ell〛 = \left( \prod_{k=0}^{2^\ell - 1} 〚y_k〛^{r_k^{(\ell)}} \right) 〚\mu_\ell〛^{-1}$ and $〚z_k^{(\ell)}〛 = 〚x_{i_k^{(\ell)}}〛 \left( 〚t_k^{(\ell)}〛 \right)^{-1} 〚r_k^{(\ell)}〛.$
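The server-side and client-side computations of this first embodiment can be sketched with a toy Paillier instance. Everything below is illustrative: the primes are tiny, the helper names (`enc`, `dec`, `server_step`, `client_step`) are hypothetical, feature indices are 0-based, and the final comparison protocol is not included. The sketch assumes Python 3.8 or later for `pow(a, -1, n)`.

```python
import math
import random

P, Q = 293, 433  # toy primes, for illustration only
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)

def enc(m):
    """Paillier encryption of m (reduced mod N) under the client's public key N."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2

def dec(c):
    """Paillier decryption with the client's secret key LAM."""
    return (((pow(c, LAM, N2) - 1) // N) * pow(LAM, -1, N)) % N

def server_step(enc_x, enc_y, nodes):
    """nodes[k] = (i_k, t_k). Returns ([[M]], list of [[z_k]], mu) as in steps 1 to 4."""
    r = [random.randrange(N) for _ in nodes]   # per-node masks r_k
    mu = random.randrange(N)                   # mask mu
    enc_m_big = enc(-mu)
    for ey, rk in zip(enc_y, r):
        enc_m_big = (enc_m_big * pow(ey, rk, N2)) % N2   # [[sum_k y_k r_k - mu]]
    # [[z_k]] = [[x_{i_k}]] * [[r_k - t_k]] = [[x_{i_k} - t_k + r_k]]
    enc_z = [(enc_x[i] * enc(rk - t)) % N2 for (i, t), rk in zip(nodes, r)]
    return enc_m_big, enc_z, mu

def client_step(y, enc_m_big, enc_z):
    """Steps 5 and 6: decrypt and form m = sum_k y_k z_k - M (mod N)."""
    m_big = dec(enc_m_big)
    z = [dec(c) for c in enc_z]
    return (sum(yk * zk for yk, zk in zip(y, z)) - m_big) % N
```

After these two steps, the client's value and the server's mask differ by exactly the selected difference between feature and threshold, modulo N, which is what the subsequent comparison protocol operates on.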

The second embodiment of step 5b that makes use of somewhat homomorphic encryption will now be described. Somewhat homomorphic encryption allows anyone to add encrypted messages as in additively homomorphic encryption. Advantageously it also allows anyone to multiply encrypted messages but just once. An example of such a scheme is the BGN (Boneh-Goh-Nissim) cryptosystem, whose description follows.

On input some security parameter, let $\mathbb{G}$ and $\mathbb{G}_T$ be two cyclic groups of order $n = q_1 q_2$ where $q_1$ and $q_2$ are prime, equipped with a bilinear map $e: \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$. Let also $g, u \leftarrow \mathbb{G}$ be two random elements in $\mathbb{G}$ and $h = u^{q_2}$. The public key is $pk = (n, \mathbb{G}, \mathbb{G}_T, e, g, h)$ and the secret key is $sk = q_1$. The message space $\mathcal{M}$ is the set $\{0, 1, \ldots, T\}$ with $T < q_2$.

The encryption of a message $m \in \mathcal{M}$ is given by $C = g^m h^r \in \mathbb{G}$ for some random element $r \in \{0, \ldots, n-1\}$. Define $C = 〚m〛$. Noting that $C^{q_1} = (g^m h^r)^{q_1} = (g^{q_1})^m$, the decryption of $C$ is obtained as the discrete logarithm of $C^{q_1}$ with respect to base $g^{q_1}$.

Next, set $G = e(g, g)$ and $H = e(g, h)$. There is another way to define the encryption of a message $m \in \mathcal{M}$. Choose some random element $r \in \{0, \ldots, n-1\}$ and define the encryption of $m$ as $\hat{C} = G^m H^r \in \mathbb{G}_T$. Define $\hat{C} = \langle\langle m \rangle\rangle$. Plaintext message $m$ can then be recovered as the discrete logarithm of $\hat{C}^{q_1}$ with respect to base $G^{q_1}$ using secret key $q_1$.

For the first encryption scheme encrypted messages may be added as follows: Given $C_1 = 〚m_1〛$ and $C_2 = 〚m_2〛$, anyone can obtain the encryption of $m_1 + m_2$ as $〚m_1 + m_2〛 = C_1 C_2 h^{r'}$ for any integer $r'$.

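For the second encryption scheme encrypted messages may be added as follows: Given $\hat{C}_1 = \langle\langle m_1 \rangle\rangle$ and $\hat{C}_2 = \langle\langle m_2 \rangle\rangle$, anyone can obtain the encryption of $m_1 + m_2$ as $\langle\langle m_1 + m_2 \rangle\rangle = \hat{C}_1 \hat{C}_2 H^{r'}$ for any integer $r'$.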

Encrypted messages may be multiplied as follows: Given $C_1 = 〚m_1〛$ and $C_2 = 〚m_2〛$, anyone can obtain the encryption of $m_1 \cdot m_2$ as $\langle\langle m_1 \cdot m_2 \rangle\rangle = e(C_1, C_2) H^{r'}$ (for any integer $r'$). It can be verified that $e(C_1, C_2) H^{r'} = G^{m_1 m_2} H^{\tilde{r}}$ for some $\tilde{r}$.
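The verification mentioned above can be written out explicitly. The sketch below assumes that $g$ generates $\mathbb{G}$, so that $u = g^a$ for some integer $a$ and hence $e(h, h) = H^{a q_2}$, and writes $C_1 = g^{m_1} h^{r_1}$ and $C_2 = g^{m_2} h^{r_2}$; the symbols $a$, $r_1$, and $r_2$ are introduced here for illustration.

```latex
\begin{aligned}
e(C_1, C_2)\, H^{r'}
  &= e\!\left(g^{m_1} h^{r_1},\, g^{m_2} h^{r_2}\right) H^{r'} \\
  &= e(g, g)^{m_1 m_2}\; e(g, h)^{m_1 r_2 + m_2 r_1}\; e(h, h)^{r_1 r_2}\; H^{r'} \\
  &= G^{m_1 m_2}\, H^{\tilde{r}},
  \qquad \tilde{r} = m_1 r_2 + m_2 r_1 + a\, q_2\, r_1 r_2 + r' .
\end{aligned}
```

The result is therefore a valid ciphertext of $m_1 m_2$ in the target group $\mathbb{G}_T$, which is why only one multiplication is available.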

Assuming a somewhat homomorphic encryption scheme is used, the multi-party computation protocol for step 5b may proceed as depicted in FIG. 2.

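After the client computes $y_k$, for $0 \le k \le 2^\ell - 1$, and the server receives inputs $〚x〛$, $t_k^{(\ell)}$, and $〚y_k〛$, for $0 \le k \le 2^\ell - 1$, in step 1, the server chooses a random mask $\mu_\ell$. In step 2, the server computes $\langle\langle m_\ell \rangle\rangle$ where $m_\ell = \mu_\ell + \sum_{k=0}^{2^\ell - 1} y_k (x_{i_k^{(\ell)}} - t_k^{(\ell)})$. In step 3, the server sends $\langle\langle m_\ell \rangle\rangle$ to the client.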

In step 4, the client decrypts $\langle\langle m_\ell \rangle\rangle$ to get $m_\ell$. In step 5, the client and server engage in a comparison protocol (e.g., the DGK+ protocol) on input $m_\ell$ for the client and $\mu_\ell$ for the server. The client then outputs $b'_\ell$ and the server outputs $b_\ell$. These output values can then be used as described above in steps 5c and 6 in the decision tree protocol.

Note that $\langle\langle m_\ell \rangle\rangle$ may be computed by the server from $〚x〛$, $t_k^{(\ell)}$, and $〚y_k〛$ as it satisfies

$\langle\langle m_\ell \rangle\rangle = \langle\langle \mu_\ell \rangle\rangle \prod_{k=0}^{2^\ell - 1} e\left( 〚y_k〛,\ 〚x_{i_k^{(\ell)}}〛\, 〚t_k^{(\ell)}〛^{-1} \right).$
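To see why this product encrypts the intended value, one can apply the addition and multiplication rules of the somewhat homomorphic scheme term by term. The following is a sketch in which the randomizers $\rho$, $\rho_k$, and $\rho'$ are unspecified exponents introduced for illustration:

```latex
\begin{aligned}
e\!\left( 〚y_k〛,\ 〚x_{i_k^{(\ell)}}〛\, 〚t_k^{(\ell)}〛^{-1} \right)
  &= G^{\, y_k \left( x_{i_k^{(\ell)}} - t_k^{(\ell)} \right)} H^{\rho_k}, \\
\langle\langle \mu_\ell \rangle\rangle \prod_{k=0}^{2^\ell-1}
  e\!\left( 〚y_k〛,\ 〚x_{i_k^{(\ell)}}〛\, 〚t_k^{(\ell)}〛^{-1} \right)
  &= G^{\mu_\ell} H^{\rho} \prod_{k=0}^{2^\ell-1} G^{\, y_k \left( x_{i_k^{(\ell)}} - t_k^{(\ell)} \right)} H^{\rho_k}
   = G^{\, m_\ell} H^{\rho'} ,
\end{aligned}
```

with $m_\ell = \mu_\ell + \sum_k y_k (x_{i_k^{(\ell)}} - t_k^{(\ell)})$, so decrypting in $\mathbb{G}_T$ yields $m_\ell$ directly and the client needs no further arithmetic before the comparison.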

Decision trees have various applications in different fields including object recognition, molecular biology, and financial analysis. The embodiments described herein describe a new protocol for decision-tree evaluation and guarantee the privacy of both the user's information and the server's model.

These embodiments have potential applications in the emerging field of cloud computing. In a cloud-based query system, the service provider possesses a model which is developed by integrating the data of thousands of users and the client wants to learn the output of the model for her input data. Currently, such services require having access to the user's information in the clear. However, this information may be very sensitive in certain cases (such as medical data). The embodiments described herein can be a good alternative to be implemented for these applications in a privacy-preserving way.

The embodiments described herein represent an improvement in the technology of the secure evaluation of a decision tree by a party who does not have access to the underlying secure data and another party who does not have access to the specifics of the decision tree. These embodiments provide a reduction in the amount of computations needed to evaluate the decision trees as well as reducing the amount of data needed to be exchanged between the parties engaged in evaluating the decision tree. As a result, the embodiments also lead to an improvement in the operation of a computer that may be used to carry out such secure decision tree evaluations.

The methods described above may be implemented in software which includes instructions for execution by a processor stored on a non-transitory machine-readable storage medium. The processor may include a memory that stores the instructions for execution by the processor.

Any combination of specific software running on a processor to implement the embodiments of the invention constitutes a specific dedicated machine.

As used herein, the term “non-transitory machine-readable storage medium” will be understood to exclude a transitory propagation signal but to include all forms of volatile and non-volatile memory. Further, as used herein, the term “processor” will be understood to encompass a variety of devices such as microprocessors, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and other similar processing devices. When software is implemented on the processor, the combination becomes a single specific machine.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention.

Although the various exemplary embodiments have been described in detail with particular reference to certain exemplary aspects thereof, it should be understood that the invention is capable of other embodiments and its details are capable of modifications in various obvious respects. As is readily apparent to those skilled in the art, variations and modifications can be effected while remaining within the spirit and scope of the invention. Accordingly, the foregoing disclosure, description, and figures are for illustrative purposes only and do not in any way limit the invention, which is defined only by the claims. 

What is claimed is:
 1. A method for performing a secure evaluation of a decision tree, comprising: receiving, by a processor of a server, an encrypted feature vector $〚x〛 = (〚x_1〛, \ldots, 〚x_n〛)$ from a client, where $n$ is an integer, wherein $〚x〛$ denotes an additively homomorphic encryption of $x$ using a cryptographic key of the second party; choosing a random mask $\mu_0$; calculating $〚m_0〛$ and sending $〚m_0〛$ to the client, wherein $〚m_0〛 = 〚x_{i_0^{(0)}} - t_0^{(0)} + \mu_0〛$ and $x_{i_0^{(0)}}$ is the value of entry $i_0^{(0)}$ of the feature vector $x$ ($1 \le i_0^{(0)} \le n$), and $t_0^{(0)}$ is a threshold value in the first node in the first level of a decision tree $\mathcal{T}'$; performing a comparison protocol on $m_0$ and $\mu_0$, wherein the server produces a comparison bit $b_0$ and the client produces a comparison bit $b'_0$; choosing a random bit $s_0 \in \{0,1\}$ and when $s_0 = 1$ switching a left and right subtrees of $\mathcal{T}'$; sending $b_0 \oplus s_0$ to the client; and for each level $\ell = 1, 2, \ldots, d-1$ of the decision tree $\mathcal{T}'$, where $d$ is the number of levels in the decision tree $\mathcal{T}'$, perform the following steps: receiving from the client $〚y_k〛$ where $k = 0, 1, \ldots, 2^\ell - 1$; performing a comparison protocol on $m_\ell$ and $\mu_\ell$, wherein $\mu_\ell$ is a random mask and $m_\ell$ is based upon, $〚x〛$, $t_k^{(\ell)}$, $〚y_k〛$, and $\mu_\ell$ and the server produces a comparison bit $b_\ell$ and the client produces a comparison bit $b'_\ell$; choosing a random bit $s_\ell \in \{0,1\}$ and when $s_\ell = 1$ switching all left and right subtrees at level $\ell$ of $\mathcal{T}'$; and sending $b_\ell \oplus s_\ell$ to the client.
 2. The method of claim 1, wherein the server engages in a 1-out-of-2^(d) oblivious transfer with the client to learn the value of 𝒯′(x).
 3. The method of claim 2, wherein the client computes r=(β₀, β₁, . . . , β_(d−1))₂, which is the index of the leaf node in 𝒯′ indicating the output of the decision tree, and where β_(ℓ)=b_(ℓ)⊕b′_(ℓ)⊕s_(ℓ).
 4. The method of claim 1, wherein the first encryption uses the Paillier cryptosystem.
 5. The method of claim 1, wherein performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprises: choosing, by the server,

random masks

for 0≤k≤

−1, and a random mask

; computing

=

; computing for 0≤k≤

−1

where

=

−

+

; and sending

and

, . . . ,

to the client.
 6. The method of claim 5, wherein performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprises: decrypting, by the client,

to get

; and computing, by the client,

=

−

.
 7. The method of claim 1, wherein performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprises: choosing, by the server, a random mask μ_(ℓ); computing <<m_(ℓ)>> where m_(ℓ)
=

+

(

−

), <<w>> denotes a second homomorphic encryption of w using a cryptographic key of the second party; and sending <<m_(ℓ)>> to the client.
 8. The method of claim 7, wherein performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprises decrypting, by the client, <<m_(ℓ)>> to get m_(ℓ).
 9. The method of claim 7, wherein the second encryption uses the Boneh-Goh-Nissim (BGN) cryptosystem.
 10. The method of claim 7, wherein the second encryption uses a homomorphic cryptosystem.
 11. The method of claim 1, wherein for each level ℓ=1, 2, . . . , d−1 of the decision tree 𝒯′, the client computes y_(k)=1{

=k}, where

=

, encrypts y_(k) resulting in ⟦y_(k)⟧, and sends ⟦y_(k)⟧ to the server.
 12. A non-transitory machine-readable storage medium encoded with instructions for performing a secure evaluation of a decision tree, comprising: instructions for receiving, by a processor of a server, an encrypted feature vector ⟦x⟧=(⟦x₁⟧, . . . , ⟦x_(n)⟧) from a client, where n is an integer, wherein ⟦x⟧ denotes an additively homomorphic encryption of x using a cryptographic key of the second party; instructions for choosing a random mask μ₀; instructions for calculating ⟦m₀⟧ and sending ⟦m₀⟧ to the client, wherein ⟦m₀⟧=⟦x_(i₀⁽⁰⁾)−t₀⁽⁰⁾+μ₀⟧ and x_(i₀⁽⁰⁾) is the value of entry i₀⁽⁰⁾ of the feature vector x (1≤i₀⁽⁰⁾≤n), and t₀⁽⁰⁾ is a threshold value in the first node in the first level of a decision tree 𝒯′; instructions for performing a comparison protocol on m₀ and μ₀, wherein the server produces a comparison bit b₀ and the client produces a comparison bit b′₀; instructions for choosing a random bit s₀∈{0,1} and when s₀=1 switching the left and right subtrees of 𝒯′; instructions for sending b₀⊕s₀ to the client; and for each level ℓ=1, 2, . . . , d−1 of the decision tree 𝒯′, where d is the number of levels in the decision tree 𝒯′, perform the following instructions: instructions for receiving from the client ⟦y_(k)⟧ where k=0, 1, . . . , 2^(ℓ)−1; instructions for performing a comparison protocol on m_(ℓ) and μ_(ℓ), wherein μ_(ℓ) is a random mask and m_(ℓ) is based upon ⟦x⟧, t_(k)^((ℓ)), ⟦y_(k)⟧, and μ_(ℓ), and the server produces a comparison bit b_(ℓ) and the client produces a comparison bit b′_(ℓ); instructions for choosing a random bit s_(ℓ)∈{0,1} and when s_(ℓ)=1 switching all left and right subtrees at level ℓ of 𝒯′; and instructions for sending b_(ℓ)⊕s_(ℓ) to the client.
 13. The non-transitory machine-readable storage medium of claim 12, wherein the server engages in a 1-out-of-2^(d) oblivious transfer with the client to learn the value of 𝒯′(x).
 14. The non-transitory machine-readable storage medium of claim 13, wherein the client computes r=(β₀, β₁, . . . , β_(d−1))₂, which is the index of the leaf node in 𝒯′ indicating the output of the decision tree, and where β_(ℓ)=b_(ℓ)⊕b′_(ℓ)⊕s_(ℓ).
 15. The non-transitory machine-readable storage medium of claim 12, wherein the first encryption uses the Paillier cryptosystem.
 16. The non-transitory machine-readable storage medium of claim 12, wherein the instructions for performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprise: instructions for choosing, by the server,

random masks

for 0≤k≤

−1, and a random mask

; instructions for computing

=

; instructions for computing for 0≤k≤

−1

where

=

−

+

; and instructions for sending

and

, . . . ,

to the client.
 17. The non-transitory machine-readable storage medium of claim 16, wherein the instructions for performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprise: instructions for decrypting, by the client,

to get

; and instructions for computing, by the client,

=

−

.
 18. The non-transitory machine-readable storage medium of claim 12, wherein the instructions for performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprise: instructions for choosing, by the server, a random mask μ_(ℓ); instructions for computing <<m_(ℓ)>> where m_(ℓ)
=

+

(

−

), <<w>> denotes a second homomorphic encryption of w using a cryptographic key of the second party; and instructions for sending <<m_(ℓ)>> to the client.
 19. The non-transitory machine-readable storage medium of claim 18, wherein the instructions for performing a comparison protocol on m_(ℓ) and μ_(ℓ) further comprise instructions for decrypting, by the client, <<m_(ℓ)>> to get m_(ℓ).
 20. The non-transitory machine-readable storage medium of claim 18, wherein the second encryption uses the Boneh-Goh-Nissim (BGN) cryptosystem.
 21. The non-transitory machine-readable storage medium of claim 18, wherein the second encryption uses a homomorphic cryptosystem.
 22. The non-transitory machine-readable storage medium of claim 12, wherein for each level ℓ=1, 2, . . . , d−1 of the decision tree 𝒯′ the client performs instructions for computing y_(k)=1{

=k}, where

=

, instructions for encrypting y_(k) resulting in ⟦y_(k)⟧, and instructions for sending ⟦y_(k)⟧ to the server.
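
By way of non-limiting illustration only, the homomorphic masking step recited in the independent claims (deriving an encryption of x_(i₀⁽⁰⁾)−t₀⁽⁰⁾+μ₀ from an encryption of the feature value) may be sketched in Python with a toy Paillier cryptosystem. The function names, the small primes, and the example values are assumptions made for this sketch and do not appear in the disclosure; the key size is far too small for any real use, and the mask is assumed small enough that no modular wraparound occurs.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair; these primes are illustrative and insecure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # standard simple generator
    mu = pow(lam, -1, n)                               # with g = n+1, L(g^lam) = lam mod n
    return (n, g), (lam, mu)

def encrypt(pub, m, rng):
    """Encrypt integer m (reduced mod n) under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext in the range [0, n)."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def masked_difference(pub, c_x, t, mu0):
    """Server side: turn an encryption of x into an encryption of x - t + mu0.

    Multiplying a ciphertext by g^k adds k to the plaintext, so the server
    never needs the decryption key.
    """
    n, g = pub
    n2 = n * n
    return (c_x * pow(g, (mu0 - t) % n, n2)) % n2

if __name__ == "__main__":
    rng = random.Random(0)
    pub, priv = keygen()
    c_x = encrypt(pub, 42, rng)                      # client encrypts a feature value
    c_m = masked_difference(pub, c_x, 17, 123456)    # server masks the difference
    m0 = decrypt(pub, priv, c_m)                     # client decrypts m0
    print(m0, m0 >= 123456)                          # m0 >= mu0 exactly when x >= t
```

In this sketch the client learns only the masked value m₀, while the server keeps μ₀; comparing the two then decides whether the feature value reaches the threshold.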
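
By way of further non-limiting illustration, the level-by-level evaluation with random subtree switching may be simulated in plaintext as follows. Everything here is an assumption made for the sketch: encryption is omitted, the comparison protocol is replaced by a stand-in that simply hands out XOR shares of the comparison result, the selection of the current node as a sum over the indicators y_(k) is one plausible reading of "based upon", the tree is stored as a complete binary tree by level, and a comparison result of 1 is taken to mean "go right". Rather than physically swapping subtrees, the simulated server tracks the accumulated switch bits and reads the node at position k XOR (s₀ . . . s_(ℓ−1))₂, which has the same effect.

```python
import random

def plain_eval(nodes, leaves, x):
    """Reference evaluation with no privacy: walk the complete tree."""
    k = 0
    for level in nodes:
        i, t = level[k]
        k = 2 * k + (1 if x[i] >= t else 0)
    return leaves[k]

def shared_compare(m, mu, rng):
    """Stand-in for the comparison protocol: XOR shares of [mu <= m]."""
    c = 1 if mu <= m else 0
    b = rng.randrange(2)          # server's bit b
    return b, c ^ b               # (b, b'), with b XOR b' equal to c

def masked_eval(nodes, leaves, x, rng):
    """Simulate both parties; returns the leaf the client would obtain."""
    flips = 0   # (s_0 ... s_{l-1})_2, known only to the server
    r = 0       # (beta_0 ... beta_{l-1})_2, known only to the client
    for l in range(len(nodes)):
        width = 1 << l
        # Client: indicator y_k = 1{r = k} (sent encrypted in the real protocol).
        y = [1 if k == r else 0 for k in range(width)]
        # Server: fresh mask, then m = mu + sum_k y_k * (x_i - t) over the
        # switched level, where position k holds original node k XOR flips.
        mu = rng.randrange(1 << 16)
        m = mu
        for k in range(width):
            i, t = nodes[l][k ^ flips]
            m += y[k] * (x[i] - t)
        b, b_client = shared_compare(m, mu, rng)
        s = rng.randrange(2)              # server's random switch bit
        beta = b_client ^ (b ^ s)         # client combines b' with the received b XOR s
        r = 2 * r + beta
        flips = 2 * flips + s
    # Leaf r of the switched tree is leaf r XOR flips of the original tree.
    return leaves[r ^ flips]

if __name__ == "__main__":
    # Hypothetical depth-2 tree: each node is (feature index, threshold).
    nodes = [[(0, 5)], [(1, 3), (1, 8)]]
    leaves = ["A", "B", "C", "D"]
    rng = random.Random(1)
    print(plain_eval(nodes, leaves, [6, 2]), masked_eval(nodes, leaves, [6, 2], rng))
```

In this simulation the client's path bits β_(ℓ) differ from the true comparison results whenever the corresponding switch bit is 1, yet the leaf reached in the switched tree is the same leaf that a direct evaluation of the original tree would reach.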